Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Tablas y almacenamiento en windows azure
1. Windows Azure Tables and Queues Deep Dive Ing. Eduardo Castro, PhD Comunidad Windows ecastro@grupoasesor.net http://ecastrom.blogspot.com
2. Agenda Overview of Windows Azure Tables Patterns and Practices for Windows Azure Tables Overview of Windows Azure Queues Patterns and Practices for Windows Azure Queues Q&A 2
3. Fundamental Storage Abstractions Tables– Provide structured storage. A Table is a set of entities, which contain a set of properties Queues– Provide reliable storage and delivery of messages for an application Blobs – Provide a simple interface for storing named files along with metadata for the file Drives – Provides durable NTFS volumes for Windows Azure applications to use (new) 3
4. Windows Azure Tables Provides Structured Storage Massively Scalable Tables Billions of entities (rows) and TBs of data Can use thousands of servers as traffic grows Highly Available & Durable Data is replicated several times Familiar and Easy to use API ADO.NET Data Services – .NET 3.5 SP1 .NET classes and LINQ REST – with any platform or language 4
5. Table Storage Concepts Entities Tables Accounts Email =… Name = … Users Email =… Name = … moviesonline Genre =… Title = … Movies Genre =… Title = … 5
6. Table Data Model Table A storage account can create many tables Table name is scoped by account Set of entities (i.e. rows) Entity Set of properties (columns) Required properties PartitionKey, RowKey and Timestamp 6
7. Required Entity Properties PartitionKey & RowKey Uniquely identifies an entity Defines the sort order Use them to scale your application Timestamp Read only Optimistic Concurrency 7
8. PartitionKey And Partitions PartitionKey Used to group entities in the table into partitions A table partition All entities with same partition key value Unit of scale Control entity locality Row key provides uniqueness within a partition 8
9. Partitions and Partition Ranges Server A Table = Movies [Action - Comedy) Server A Table = Movies Server B Table = Movies [Comedy- Western) 9
13. Agenda Overview of Windows Azure Tables Patterns and Practices for Windows Azure Tables Overview of Windows Azure Queues Patterns and Practices for Windows Azure Queues Q & A 13
14. Key Selection: Things to Consider Scalability Distribute load as much as possible Hot partitions can be load balanced PartitionKeyis critical for scalability Query Efficiency & Speed Avoid frequent large scans Parallelize queries Entity group transactions (new) Transactions across a single partition Transaction semantics & Reduce round trips 14
15. Key Selection: Case Study 1 Table for listing all movies Home page lists movies based on chosen category 15
16. Movie Listing – Solution 1 Why do I need multiple PartitionKeys? Account name as Partition Key Movie title as RowKey since movie names need to be sorted Category as a separate property Does this scale? 16
17. Movie Listing – Solution 1 Single partition - Entire table served by one server All requests served by that single server Does not scale Client Client Request Request Request Request Server A 17
18. Movie Listing – Solution 2 All movies partitioned by category Allows system to load balance hot partitions Load distributed Better than single partition Server A Client Client Request Request Request Request Request Request Request Request Server B 18
19. Key Selection: Case Study 2 Log every transaction into a table for diagnostics Scale Write Intensive Scenario Logs can be retrieved for a given time range 19
20. Logging - Solution 1 Timestamp as Partition Key Looks like an obvious choice It is not a single partition as time moves forward Append only Requests to single partition range Load balancingdoesnot help Server may throttle Server A Applications Client Server B Request Request Request Request 20
21. Logging Solution 2 - Distribute "Append Only” Prefix timestamp such that load is distributed Id of the node logging Hash into N buckets Write load is now distributed Better throughput To query logs in time range Parallelize it across prefix values Server A Applications Client Server B Request Request Request Request 21
22. Key Selection: Query Efficiency & Speed Select keys that allow fast retrieval Reduce scan range Reduce scan frequency 22
23. Single Entity Query Where PartitionKey=‘SciFi’ and RowKey = ‘Star Trek’ Efficient processing No continuation tokens Server A Client Request Server B Result 23
24. Table Scan Query Select * from Movies where Rating > 4 Returns Continuation token 1000 movies in result set Partition range boundary Serial Processing: Wait for continuation token before proceeding Returns 1000 movies Partition range boundary hit Server A Cont. Cont. Return continuation Client Request Request Cont. Request Cont. Server B Cont. 24
25. Make Scans Faster Split “Select * from Movies where Rating > 4” into Where PartitionKey >= “A” and PartitionKey < “D” and Rating > 4 Where PartitionKey >= “D” and PartitionKey < “I” and Rating > 4 Etc. Execute in parallel Each query handles continuation Server A Cont. Cont. Request Client Request Request Server B Cont. 25
26. Query Speed Fast Single PartitionKey and RowKey with equality Medium Single partition but a small range for RowKey Entire partition or table that is small Slow Large single scan Large table scan “OR” predicates on keys => no query optimization => results in scan Expect continuation token for all except in 1 26
27. Make Queries Faster Large Scans Split the range and parallelize queries Create and maintain own views that help queries “Or” Predicates Execute individual query in parallel instead of using “OR” User Interactive Cache the result to reduce scan frequency 27
28. Expect Continuation Tokens – Seriously! Maximum of 1000 rows in a response At the end of partition range boundary Maximum of 5 seconds to execute the query 28
29. Entity Group Transactions (EGT) (new) Atomically perform multiple insert/update/deleteover entities in same partition in a single transaction Maximum of 100 commands in a single transaction and payload < 4 MB ADO.Net Data Service Use SaveChangesOptions.Batch 29
30. Key Selection: Entity Group Transaction Case Study Maintain user account information Account ID, User Name, Address, Number of rentals Maintain information of checked out rentals Account ID, Movie Title, Check out date, Due date Solution 1 – Maintain two tables – Users & Rentals Handle Cross table consistency Insert into Rentals table succeeds Update to Users table fails Queue to maintain consistency 30
31. Solution 2 Store Account Information and Rental details in same table Maintain same PartitionKey to enforce transactions Account ID as PartitionKey Update total count and Insert new rentals using Entity Group Transaction Prefix RowKey with “Kind” code: A = Account, R = Rental Row key for account info: [Kind Code]_[AccountId] Row Key for rental info: [Kind Code]_[Title] Rental Properties not set for Account row and vice versa 31
32. Best Practices & Summary Select PartitionKey and RowKey that help scale Efficient for frequently used queries Supports batch transactions Distributes load Distribute “Append only” patterns using prefix to PartitionKey Always Handle continuation tokens Client can maintain their own cache/views instead of frequent scans Future Feature - Secondary Index Execute parallel queries instead of “OR” predicates Implement back-off strategy for retries 32
33. Agenda Overview of Windows Azure Tables Patterns and Practices for Windows Azure Tables Overview of Windows Azure Queues Patterns and Practices for Windows Azure Queues Q & A 33
34. Windows Azure Queues Queue are performance efficient, highly available and provide reliable message delivery Simple, asynchronous work dispatch Programming semantics ensure that a message can be processed at least once Access is provided via REST 34
36. Account, Queues and Messages An account can create many queues Queue Name is scoped by the account A Queue contains messages No limit on number of messages stored in a queue Set a limit for message expiration Messages Message size <= 8 KB To store larger data, store data in blob/entity storage, and the blob/entity name in the message Message now has dequeue count 36
38. Queue Programming Api CloudQueueClientqueueClient = newCloudQueueClient(baseUri, credentials); CloudQueuequeue = queueClient.GetQueueReference("test1"); queue.CreateIfNotExist(); //MessageCountis populated via FetchAttributes queue.FetchAttributes(); CloudQueueMessagemessage = newCloudQueueMessage("Some content"); queue.AddMessage(message); message = queue.GetMessage(TimeSpan.FromMinutes(10) /*visibility timeout*/); //Process the message here … queue.DeleteMessage(message); 38
39. Agenda Overview of Windows Azure Tables Patterns and Practices for Windows Azure Tables Overview of Windows Azure Queues Patterns and Practices for Windows Azure Queues Q & A 39
43. Best Practices & Summary Make message processing idempotent No need to deal with failures Do not rely on order Invisible messages result in out of order Use Dequeue count to remove poison messages Enforce threshold on message’s dequeue count Use message count to dynamically increase/reduce workers Use blob to store message data with reference in message Messages > 8KB Batch messages Garbage collect orphaned blobs 43
44. Summary Table Scalable & Reliable Structured Storage System Partitioning is critical to scalability Entity Group Transactions (new) Queue Scalable & Reliable Messaging System Dequeue count returned with message (new) Use back-off strategy on retries Official Storage Client Library (new) 44